using TDAfly, TDAfly.Preprocessing, TDAfly.TDA, TDAfly.Analysis
using Images: mosaicview
using Plots: plot, display, heatmap, scatter
using PersistenceDiagramsIn [1]:
1 Introduction
Falar sobre o dataset, TDA, etc.
2 Methods
All images are in the images/processed directory. For each image, we load it, apply a gaussian blur, crop and make it have 150 pixels of height. The blurring step is necessary to “glue” small holes in the figure and keep it connected.
In [1]:
paths = readdir("images/processed", join = true)
species = basename.(paths) .|> (x -> replace(x, ".png" => ""))
individuals = map(species) do specie
s = split(specie, " ")
s[1][1] * "-" * s[2]
end
wings = load_wing.(paths, blur = 1.3)
Xs = map(image_to_r2, wings);In [1]:
mosaicview(wings, ncol = 4, fillvalue=1)2.1 Vietoris-Rips filtration
We select 500 points from each image using a farthest point sample method
In [1]:
samples = map(Xs) do X
ids = farthest_points_sample(X, 500)
X[ids]
end;Progress: 0%|█ | ETA: 0:01:28 Progress: 100%|█████████████████████████████████████████| Time: 0:00:01 Progress: 40%|█████████████████ | ETA: 0:00:00 Progress: 85%|████████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 27%|████████████ | ETA: 0:00:00 Progress: 59%|████████████████████████ | ETA: 0:00:00 Progress: 90%|█████████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 27%|███████████ | ETA: 0:00:00 Progress: 57%|████████████████████████ | ETA: 0:00:00 Progress: 86%|████████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 32%|██████████████ | ETA: 0:00:00 Progress: 65%|███████████████████████████ | ETA: 0:00:00 Progress: 96%|████████████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 40%|█████████████████ | ETA: 0:00:00 Progress: 77%|████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 29%|████████████ | ETA: 0:00:00 Progress: 60%|█████████████████████████ | ETA: 0:00:00 Progress: 91%|██████████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 37%|████████████████ | ETA: 0:00:00 Progress: 73%|██████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 42%|██████████████████ | ETA: 0:00:00 Progress: 84%|███████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 42%|██████████████████ | ETA: 0:00:00 Progress: 81%|██████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 39%|█████████████████ | ETA: 0:00:00 Progress: 85%|███████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 46%|███████████████████ | ETA: 0:00:00 Progress: 92%|██████████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 51%|█████████████████████ | ETA: 0:00:00 Progress: 99%|█████████████████████████████████████████| ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 63%|██████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 53%|██████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00 Progress: 80%|█████████████████████████████████ | ETA: 0:00:00 Progress: 100%|█████████████████████████████████████████| Time: 0:00:00
and create an empty dictionary to store all computations
In [1]:
simple_rips_dc = Dict();We then calculate its persistence diagrams using the Vietoris-Rips filtration etc.
In [1]:
# get only the 1-dimensional PD
simple_rips_dc["PD"] = rips_pd.(samples, cutoff = 5, threshold = 200) .|> last;We create the 1-dimensional persistence image for each persistence diagram using 10x10 matrices
In [1]:
PI = PersistenceImage(simple_rips_dc["PD"], size = (10, 10))
simple_rips_dc["PI"] = PI.(simple_rips_dc["PD"]);2.1.1 Examples
Below are some examples of 1-dimensional barcodes, its persistence image and the original wing that generated it. Note: we are plotting the barcode using the birth and persistence.
In [1]:
# plot some images to see the barcodes
map([1, 4, 8, 10, 15]) do i
p = plot_wing_with_pd(simple_rips_dc["PD"][i], simple_rips_dc["PI"][i], samples[i], species[i])
display(p)
end;We now calculate the Euclidean distance between each persistence image (seen as a vector of \(\mathbb{R}^{10x10}\)) and plot its heatmap
In [1]:
simple_rips_dc["Distance_matrix"] = pairwise_distance(simple_rips_dc["PI"]);In [1]:
plot_heatmap(
simple_rips_dc["Distance_matrix"],
individuals,
"Distance matrix for Vietoris-Rips barcodes"
)2.2 Persistence Homology Transform
Now we will create several filtrations based on points and lines, etc.
We start with the point (0, 0). Its filtration is the following
In [1]:
A = wings[10] |> image_to_array;
f = dist_to_point(0, 0)
Af = modify_array(A, f)
heatmap(Af)with corresponding sublevel barcode as
In [1]:
point_pds = cubical_pd(Af, cutoff = 0.05)
plot_pd(point_pds)or, with persistence in the y-axis:
In [1]:
plot_pd(point_pds, persistence = true)Let’s see step-by-step of this filtration:
In [1]:
for tr ∈ reverse([0:0.1:1;])
X = findall_ids(>(tr), Af)
title = "threshold: $tr"
p = scatter(first.(X), last.(X), title = title)
display(p)
endDue to noise, some connected components are born in 0.2 and die only at 0. But the loops seems alright.
Let’s apply this idea to all images!
In [1]:
point_rips_dc = Dict()
f = dist_to_point(0, 0)
point_rips_dc["PD"] = map(wings) do wing
A = wing |> image_to_array;
Af = modify_array(A, f)
point_pds = cubical_pd(Af, cutoff = 0.05)[2]
end;
PI = PersistenceImage(point_rips_dc["PD"], size = (10, 10))
point_rips_dc["PI"] = PI.(point_rips_dc["PD"]);In [1]:
final_dc = Dict();
dics = [simple_rips_dc, point_rips_dc]
final_dc["PI"] = simple_rips_dc["PI"] + point_rips_dc["PI"];In [1]:
final_dc["Distance_matrix"] = pairwise_distance(final_dc["PI"]);In [1]:
plot_heatmap(
final_dc["Distance_matrix"],
individuals,
"Distance matrix for Vietoris-Rips + point-based barcodes"
)What happened?! Some wing has a big distance from everything else!
In [1]:
Xs = point_rips_dc["PI"]
id = findfirst(==("c-39"), individuals)
# id = 9
X = Xs[id]
heatmap(X)In [1]:
wings[id]In [1]:
heatmap(wings[id])The reason is that it has one small cycle right at the beggining of the filtration…
3 Draft…..
In [1]:
ps = map(reverse([0:0.1:0.9;])) do tr
X = findall_ids(>(tr), Af)
title = "threshold: $tr"
p = scatter(first.(X), last.(X), title = title)
# display(p)
end
plot(ps...)